- Title
- Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law
- Creator
- Hou, Jian; Wang, Fangyuan; Wang, Lili; Chen, Zhiyong
- Relation
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, Volume 9A (online, 2-9 February 2021), pp. 7737-7745
- Relation
- https://doi.org/10.1609/aaai.v35i9.16945
- Publisher
- Association for the Advancement of Artificial Intelligence
- Resource Type
- conference paper
- Date
- 2021
- Description
Recent advances in Multi-agent Reinforcement Learning (MARL) have made it possible to accomplish a variety of cooperative and competitive tasks through trial and error with deep neural networks. These successes motivate us to bring the mechanism of MARL to the Multi-agent Resilient Consensus (MARC) problem, which studies consensus in a network of agents that may contain faulty ones. Because the system goal is naturally expressed in terms of agent states, the key component of MARL, the reward function, can be constructed directly from the relative distances among agents. First, we apply Deep Deterministic Policy Gradient (DDPG) to each agent to learn the adjacency weights of its neighbors in a distributed manner, an approach we call Distributed-DDPG (D-DDPG); the aim is to drive the weights of suspicious agents toward zero and thereby eliminate their influence. Second, to dispense with neural networks and their time-consuming training process, we further present a Q-learning based algorithm, called Q-consensus, which builds a reward function and a credibility function for each pair of neighboring agents so that the adjacency weights update adaptively. Experimental results indicate that both algorithms perform well in the presence of constant and/or random faulty agents, with Q-consensus outperforming D-DDPG. Compared to traditional resilient consensus strategies such as Weighted-Mean-Subsequence-Reduced (W-MSR) or trustworthiness analysis, the proposed Q-consensus algorithm greatly relaxes the topology requirements and reduces storage and computation loads. Finally, a smart-car hardware platform consisting of six vehicles is used to verify the effectiveness of the Q-consensus algorithm by achieving resilient velocity synchronization.
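To make the adaptive-law idea in the abstract concrete, the following is a minimal sketch, assuming a scalar-state consensus loop in which a pairwise reward built from relative distance drives a per-edge credibility, and the credibilities are renormalized into adjacency weights. All names and constants here (`pairwise_reward`, `ALPHA`, `EPS`, the exponential reward shape, the constant-fault model) are illustrative assumptions, not the paper's actual Q-consensus update.

```python
import numpy as np

# Illustrative constants only -- not values from the paper.
ALPHA = 0.1   # credibility learning rate
EPS = 0.5     # consensus step size
N_STEPS = 300

def pairwise_reward(x_i, x_j, scale=1.0):
    """Reward from relative distance, as the abstract suggests: neighbors
    whose states stay close earn rewards near 1; distant (likely faulty)
    neighbors earn rewards near 0."""
    return float(np.exp(-scale * abs(x_i - x_j)))

def q_consensus_step(x, credibility, neighbors, faulty=(), faulty_value=0.0):
    """One synchronous round: move each edge's credibility toward the current
    pairwise reward, renormalize credibilities into adjacency weights, then
    apply a weighted consensus update. Faulty agents simply broadcast a
    constant value (a 'constant fault' model, assumed for illustration)."""
    x_new = x.copy()
    for i in range(len(x)):
        if i in faulty:
            x_new[i] = faulty_value
            continue
        for j in neighbors[i]:
            r = pairwise_reward(x[i], x[j])
            credibility[i][j] += ALPHA * (r - credibility[i][j])
        total = sum(credibility[i][j] for j in neighbors[i])
        x_new[i] = x[i] + EPS * sum(
            (credibility[i][j] / total) * (x[j] - x[i]) for j in neighbors[i]
        )
    return x_new

# Usage: six agents on a complete graph, one constant-faulty agent.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=6)
neighbors = {i: [j for j in range(6) if j != i] for i in range(6)}
credibility = {i: {j: 1.0 for j in neighbors[i]} for i in range(6)}
for _ in range(N_STEPS):
    x = q_consensus_step(x, credibility, neighbors, faulty={5}, faulty_value=9.0)
print(x)  # healthy agents 0-4 cluster together, down-weighting faulty agent 5
```

In this sketch the credibility update is an exponential moving average of the pairwise reward, so a neighbor that keeps broadcasting an outlying value sees its weight decay toward zero, which is the qualitative behavior the abstract attributes to Q-consensus.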
- Subject
(deep) neural network algorithms; reinforcement learning; resilient; multi-agent systems
- Identifier
- http://hdl.handle.net/1959.13/1450618
- Identifier
- uon:43985
- Language
- eng
- Reviewed